
    Some Commonly Used Speech Feature Extraction Algorithms

    Speech is a complex, naturally acquired human motor ability. In adults it is characterized by the production of about 14 different sounds per second through the coordinated action of roughly 100 muscles. Speaker recognition is the ability of software or hardware to receive a speech signal, identify the speaker present in it, and recognize that speaker afterwards. Feature extraction converts the speech waveform into a parametric representation at a relatively low data rate for subsequent processing and analysis; good classification therefore depends on high-quality features. This chapter discusses the following speech feature extraction techniques: Mel Frequency Cepstral Coefficients (MFCC), Linear Prediction Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), Line Spectral Frequencies (LSF), Discrete Wavelet Transform (DWT) and Perceptual Linear Prediction (PLP). These methods have been tested in a wide variety of applications, which gives them a high level of reliability and acceptability. Researchers have made several modifications to these techniques to make them less susceptible to noise, more robust and less time-consuming. In conclusion, no method is superior to the others; the area of application determines which method to select.
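    To make the idea of a compact parametric representation concrete, here is a minimal MFCC sketch in pure Python. It is not the chapter's implementation: the 8 kHz sample rate, 25 ms frames with a 10 ms hop, 256-point spectrum, 20 mel filters, 12 retained coefficients, the 0.97 pre-emphasis factor and the 2595 * log10(1 + f/700) mel formula are all common textbook choices assumed here.

```python
import cmath
import math


def hz_to_mel(f_hz):
    # One common mel formula; other published variants exist.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def split_frames(signal, frame_len, hop):
    # Overlapping frames; trailing samples that do not fill a whole frame are dropped.
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def power_spectrum(frame, n_fft):
    # Naive DFT of the zero-padded frame (O(n^2)); real code would use an FFT.
    x = list(frame) + [0.0] * (n_fft - len(frame))
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))) ** 2 / n_fft
            for k in range(n_fft // 2 + 1)]


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters with edges spaced evenly on the mel scale from 0 Hz to Nyquist.
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate=8000, frame_len=200, hop=80, n_fft=256, n_filters=20, n_ceps=12):
    # Pre-emphasis: y[n] = x[n] - 0.97 * x[n-1] boosts the high-frequency part of the spectrum.
    emph = [signal[0]] + [signal[n] - 0.97 * signal[n - 1] for n in range(1, len(signal))]
    # Hamming window reduces spectral leakage at the frame edges.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for frame in split_frames(emph, frame_len, hop):
        spec = power_spectrum([s * w for s, w in zip(frame, window)], n_fft)
        # Log mel filterbank energies; the floor avoids log(0) on silent frames.
        log_e = [math.log(max(sum(g * p for g, p in zip(row, spec)), 1e-10)) for row in bank]
        # DCT-II of the log energies; keep coefficients 1..n_ceps (c0 tracks overall log energy).
        features.append([sum(e * math.cos(math.pi * q * (m + 0.5) / n_filters) for m, e in enumerate(log_e))
                         for q in range(1, n_ceps + 1)])
    return features


# Usage: 0.1 s of a 440 Hz tone sampled at 8 kHz (800 samples).
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
coeffs = mfcc(tone)
print(len(coeffs), "frames x", len(coeffs[0]), "coefficients")
```

    On the test tone, 800 samples become 8 frames of 12 coefficients (96 values), which illustrates the reduced data rate. The naive DFT only keeps the sketch dependency-free.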

    Level Crossing Control: A Novel Method Using Sound Recognition

    A level crossing (LX), or railway crossing, is an intersection between a public road and a railway line, and it can be controlled actively or passively. Sound recognition can be used to control a level crossing actively. This study proposes a system that uses sound to control an LX, with Mel Frequency Cepstral Coefficients (MFCC) as the feature extractor and a Recurrent Neural Network (RNN) as the classifier. The proposed system shows great potential for helping to reduce the loss of lives and property at level crossings.
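    The abstract names an RNN as the classifier but gives no architecture. As an assumption-laden sketch, the forward pass of a single-layer Elman RNN over a sequence of MFCC frames could look like this. The hidden size, random initialization, class names (`train`, `no_train`) and classification from the final hidden state are all hypothetical, and training by backpropagation through time is omitted, so the outputs mean nothing until the weights are learned.

```python
import math
import random


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(z):
    top = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class ElmanRNN:
    """Single-layer RNN mapping a sequence of feature vectors to class probabilities (untrained)."""

    def __init__(self, n_in, n_hidden, n_classes, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small random weights; a real system would learn these with backpropagation through time.
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_x = init(n_hidden, n_in)
        self.w_h = init(n_hidden, n_hidden)
        self.b_h = [0.0] * n_hidden
        self.w_o = init(n_classes, n_hidden)
        self.b_o = [0.0] * n_classes

    def forward(self, sequence):
        h = [0.0] * len(self.b_h)
        for x in sequence:
            # h_t = tanh(W_x x_t + W_h h_{t-1} + b_h)
            h = [math.tanh(a + b + c) for a, b, c in zip(matvec(self.w_x, x), matvec(self.w_h, h), self.b_h)]
        # The final hidden state summarizes the whole clip and is mapped to class scores.
        return softmax([a + b for a, b in zip(matvec(self.w_o, h), self.b_o)])


# Usage with hypothetical classes and stand-in MFCC frames (5 frames x 12 coefficients).
labels = ["train", "no_train"]
rnn = ElmanRNN(n_in=12, n_hidden=8, n_classes=len(labels))
clip = [[0.1 * (i + j) for j in range(12)] for i in range(5)]
probs = rnn.forward(clip)
print(labels[probs.index(max(probs))], probs)
```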

    LPC and its derivatives for stuttered speech recognition

    Stuttering, or stammering, is the disruption of the normal flow of speech by dysfluencies, which can be repetitions or prolongations of a phoneme or syllable. Stuttering cannot be permanently cured, though it may go into remission, or stutterers can learn to shape their speech into fluent speech with appropriate speech pathology treatment. Linear Prediction Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC) and Line Spectral Frequencies (LSF) were used for feature extraction, with a Multilayer Perceptron (MLP) as the classifier. The samples were obtained from UCLASS (University College London Archive of Stuttered Speech) release 1. The LPCC-MLP system had the highest overall sensitivity and precision and the lowest overall misclassification rate. However, it had difficulty with F3: its sensitivity to F3 was negligible, its precision was moderate, and its misclassification rate was negligible but above 10%.
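    A minimal sketch of how LPC and LPCC are typically computed for one frame: the autocorrelation method with the Levinson-Durbin recursion, followed by the standard LPC-to-cepstrum recursion. It assumes the predictor convention x_hat[n] = sum_k a_k x[n - k] (some texts flip the sign of a_k), omits windowing and the gain term c_0, and is not taken from the paper.

```python
def autocorrelation(frame, max_lag):
    return [sum(frame[i] * frame[i + k] for i in range(len(frame) - k)) for k in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion. Returns (a_1..a_p, residual energy) for the predictor
    x_hat[n] = sum_k a_k * x[n - k]."""
    r = autocorrelation(frame, order)
    if r[0] == 0.0:
        return [0.0] * order, 0.0  # silent frame: nothing to predict
    a = [0.0] * (order + 1)  # a[0] is unused so that a[k] matches a_k
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_lpcc(a, n_ceps):
    """Cepstral coefficients c_1..c_n of the all-pole model 1 / (1 - sum_k a_k z^-k)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] (gain term) is left out of this sketch
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c[1:]


# Usage: the impulse response of x[n] = 0.9 x[n-1] should give a_1 close to 0.9 and a_2 close to 0.
signal = [0.9 ** n for n in range(200)]
coeffs, residual = lpc(signal, order=2)
print(coeffs, lpc_to_lpcc(coeffs, 4))
```

    For a single pole a, the recursion reproduces the closed form c_m = a^m / m, which is a quick way to check it.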

    Receiver Operating Characteristics Measure for the Recognition of Stuttering Dysfluencies Using Line Spectral Frequencies

    Stuttering is a motor-speech disorder that shares features with other motor control disorders such as dystonia, Parkinson's disease and Tourette's syndrome. It results from complex interactions between motor, language, emotional and genetic factors. This study used Line Spectral Frequency (LSF) for feature extraction and three classifiers for identification: Multilayer Perceptron (MLP), Recurrent Neural Network (RNN) and Radial Basis Function (RBF). The database was UCLASS (University College London Archive of Stuttered Speech) release 1, whose recordings are of people aged 12y11m to 19y5m who were referred to clinics in London for assessment of their stuttering. The results were interpreted using sensitivity, accuracy, precision and misclassification rate. With RBF, only M1 and M2 had sensitivity below 100%: the sensitivity of M1 was between 40% and 60%, and was therefore categorized as moderate, while that of M2 falls between 60% and 80%, classed as substantial. Overall, RBF outperforms the other two classifiers, MLP and RNN, on all the performance metrics considered.
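    LSFs are derived from the LPC polynomial. The sketch below forms the sum and difference polynomials P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z), whose roots lie on the unit circle for a stable A(z), and returns their angles in (0, pi). It assumes an even order and the inverse filter A(z) = 1 - sum_k a_k z^-k, and it locates roots with a grid scan plus bisection instead of the Chebyshev-polynomial search often used in speech codecs; it is an illustration, not the study's code.

```python
import math


def lsf(pred_coeffs, grid=4096):
    """Line spectral frequencies (radians in (0, pi), ascending) of A(z) = 1 - sum_k a_k z^-k.

    Assumes an even order p, so P(z) has its trivial root at z = -1 and Q(z) at z = +1.
    """
    p = len(pred_coeffs)
    if p % 2:
        raise ValueError("this sketch assumes an even LPC order")
    a = [1.0] + [-c for c in pred_coeffs] + [0.0]  # coefficients of A(z), padded to length p + 2
    n = p + 1
    sym = [a[k] + a[n - k] for k in range(n + 1)]   # P(z) = A(z) + z^-(p+1) A(1/z)
    anti = [a[k] - a[n - k] for k in range(n + 1)]  # Q(z) = A(z) - z^-(p+1) A(1/z)

    # On z = e^{jw}, removing the linear-phase factor e^{-jwn/2} leaves real trigonometric
    # sums (up to a constant factor for Q) whose zeros are the LSFs.
    def p_real(w):
        return sum(c * math.cos(w * (k - n / 2)) for k, c in enumerate(sym))

    def q_real(w):
        return sum(c * math.sin(w * (k - n / 2)) for k, c in enumerate(anti))

    ws = [math.pi * (i + 0.5) / grid for i in range(grid)]  # open grid: skips trivial roots at 0 and pi
    roots = []
    for f in (p_real, q_real):
        vals = [f(w) for w in ws]
        for i in range(grid - 1):
            if vals[i] * vals[i + 1] < 0:  # a sign change brackets a root
                lo, hi, f_lo = ws[i], ws[i + 1], vals[i]
                for _ in range(60):  # bisection
                    mid = 0.5 * (lo + hi)
                    f_mid = f(mid)
                    if f_lo * f_mid <= 0:
                        hi = mid
                    else:
                        lo, f_lo = mid, f_mid
                roots.append(0.5 * (lo + hi))
    return sorted(roots)


# Usage: with all predictor coefficients zero, the LSFs are equally spaced at k*pi/(p+1).
print(lsf([0.0, 0.0]))  # approximately [pi/3, 2*pi/3]
print(lsf([0.9, -0.5]))
```

    The equal spacing for A(z) = 1 is a handy sanity check. The grid size limits how close two LSFs can be and still be separated.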

    A comparative study of the difference between MFCC and PLP in the recognition of sound

    Sound is one of the most important tools for classifying, recognizing and identifying objects in the environment. The raw sound signal is complex and unsuitable as direct input to a sound identification system; hence the need for a good front end. Using the RNN classifier with MFCC, the identification rates for aircraft, car, rain, thunder and train are 72.7%, 73.7%, 78.9%, 57.1% and 58.3%, respectively. Compared with MLP, this represents a decline of 31.6%, 19.4%, 18.5%, 38.0% and 26.4% for aircraft, car, rain, thunder and train, respectively. For the inputs used in this experiment, MFCC outperforms PLP, and both MFCC and PLP perform better with MLP as the classifier.
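    The main front-end difference between MFCC and PLP lies in how the spectrum is warped and compressed. The sketch below shows only those steps: the mel formula 2595 * log10(1 + f/700) commonly used for MFCC, the Bark warping 6 * asinh(f/600) from Hermansky's PLP, and PLP's cube-root intensity-to-loudness compression. The rest of the PLP chain (critical-band integration, equal-loudness pre-emphasis and all-pole modelling) is omitted, and none of this is taken from the study itself.

```python
import math


def hz_to_mel(f_hz):
    # Mel warping commonly used in MFCC front ends (one of several published variants).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_bark(f_hz):
    # Bark warping from Hermansky's PLP: 6 * ln(f/600 + sqrt((f/600)^2 + 1)), i.e. 6 * asinh(f/600).
    return 6.0 * math.asinh(f_hz / 600.0)


def loudness(power):
    # PLP's intensity-to-loudness step: cube-root compression of the auditory spectrum.
    return power ** (1.0 / 3.0)


for f in (100, 500, 1000, 2000, 4000):
    print(f"{f:5d} Hz -> {hz_to_mel(f):7.1f} mel  {hz_to_bark(f):5.2f} Bark")
```

    Both warpings are roughly linear at low frequencies and logarithmic at high ones; the cube root plays a role similar to the logarithm in MFCC, compressing the dynamic range.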

    Receiver operating characteristics measure for the recognition of stuttering dysfluencies using line spectral frequencies

    Stuttering is a motor-speech disorder that has features in common with other motor control disorders such as dystonia, Parkinson's disease, and Tourette's syndrome. Stuttering results from complex interactions between motor, language, emotional, and genetic factors. This study used Line Spectral Frequency (LSF) for feature extraction and three classifiers for identification: Multilayer Perceptron (MLP), Recurrent Neural Network (RNN) and Radial Basis Function (RBF). The UCLASS (University College London Archive of Stuttered Speech) release 1 was used as the database in this research. These recordings were from people aged 12y11m to 19y5m who were referred to clinics in London for assessment of their stuttering. The performance metrics used for interpreting the results are sensitivity, accuracy, precision, and misclassification rate. With RBF, only M1 and M2 had sensitivity below 100%. The sensitivity of M1 was found to be between 40% and 60%, and was therefore categorized as moderate, while that of M2 falls between 60% and 80%, classed as substantial. Overall, RBF outperforms the other two classifiers, MLP and RNN, on all the performance metrics considered.
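    The abstract does not spell out its metric formulas. Assuming the usual one-vs-rest confusion-matrix definitions (sensitivity = TP/(TP+FN), precision = TP/(TP+FP), accuracy = (TP+TN)/N, misclassification rate = 1 - accuracy), they can be computed per class as in this sketch; the class names in the usage lines are hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Sensitivity, precision, accuracy and misclassification rate from one-vs-rest counts."""
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / total
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "accuracy": accuracy,
        "misclassification_rate": 1.0 - accuracy,
    }


def per_class_metrics(y_true, y_pred):
    """Treat each class as the positive class in turn (one-vs-rest)."""
    results = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        tn = len(y_true) - tp - fp - fn
        results[cls] = binary_metrics(tp, fp, fn, tn)
    return results


# Usage with hypothetical dysfluency labels.
truth = ["fluent", "repetition", "prolongation", "repetition", "fluent"]
guess = ["fluent", "repetition", "repetition", "repetition", "prolongation"]
for cls, m in per_class_metrics(truth, guess).items():
    print(cls, {k: round(v, 3) for k, v in m.items()})
```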